Project-Team:Maxplus

Inria | Raweb 2013 | Presentation of the Project-Team Maxplus | Maxplus Web Site


	PDF	e-Pub

Previous |

Home | Next next

Section: Research Program

Applications monotones et théorie de Perron-Frobenius non-linéaire, ou l'approche opératorielle du contrôle optimal et des jeux/Monotone maps and non-linear Perron-Frobenius theory, or the operator approach to optimal control and games

On sait depuis le tout début des travaux en décision markovienne que les opérateurs de la programmation dynamique $f$ de problèmes de contrôle optimal ou de jeux (à somme nulle et deux joueurs), avec critère additif, ont les propriétés suivantes :

\begin{matrix} monotonie / monotonicity & x \leq y & \Rightarrow & f (x) \leq f (y), \\ contraction / nonexpansiveness & {∥ f (x) - f (y) ∥}_{\infty} & \leq & {∥ x - y ∥}_{\infty} . \end{matrix}

(6)

Ici, l'opérateur $f$ est une application d'un certain espace de fonctions à valeurs réelles dans lui-même, $\leq$ désigne l'ordre partiel usuel, et ${∥ \cdot ∥}_{\infty}$ désigne la norme sup. Dans le cas le plus simple, l'ensemble des états est ${1, ..., n}$ et $f$ est une application de $ℝ^{n}$ dans lui-même. Les applications monotones qui sont contractantes pour la norme du sup peuvent être vues comme des généralisations non-linéaires des matrices sous-stochastiques. Une sous-classe utile, généralisant les matrices stochastiques, est formée des applications qui sont monotones et commutent avec l'addition d'une constante [91] (celles ci sont parfois appelées fonctions topicales). Les problèmes de programmation dynamique peuvent être traduits en termes d'opérateurs : l'équation de la programmation dynamique d'un problème de commande optimale à horizon fini s'écrit en effet $x (k) = f (x (k - 1))$ , où $x (k)$ est la fonction valeur en horizon $k$ et $x (0)$ est donné; la fonction valeur $y$ d'un problème à horizon infini (y compris le cas d'un problème d'arrêt optimal) vérifie $y = f (y)$ ; la fonction valeur $z$ d'un problème avec facteur d'actualisation $0 < α < 1$ vérifie $z = f (α z)$ , etc. Ce point de vue abstrait a été très fructueux, voir par exemple [52] . Il permet d'inclure la programmation dynamique dans la perspective plus large de la théorie de Perron-Frobenius non-linéaire, qui, depuis l'extension du théorème de Perron-Frobenius par Krein et Rutman, traite des applications non linéaires sur des cônes vérifiant des conditions de monotonie, de contraction ou d'homogénéité. Les problèmes auxquels on s'intéresse typiquement sont la structure de l'ensemble des points fixes de $f$ , le comportement asymptotique de $f^{k}$ , en particulier l'existence de la limite de $f^{k} (x) / k$ lorsque $k$ tends vers l'infini (afin d'obtenir le coût ergodique d'un problème de contrôle optimal ou de jeux), l'asymptotique plus précise de $f^{k}$ , à une normalisation près (afin d'obtenir le comportement précis de l'itération sur les valeurs), etc. Nous renvoyons le lecteur à [150] pour un panorama. Signalons que dans [110] ,[7] , des algorithmes inspirés de l'algorithme classique d'itérations sur les politiques du contrôle stochastique ont pu être introduits dans le cas des opérateurs monotones contractants généraux, en utilisant des résultats de structure de l'ensemble des points fixes de ces opérateurs. Les applications de la théorie des applications monotones contractantes ne se limitent pas au contrôle optimal et aux jeux. En particulier, on utilise la même classe d'applications dans la modélisation des systèmes à événements discrets, voir le § 3.5 ci-dessous, et une classe semblable d'applications en analyse statique de programmes, voir § 4.4 ci-dessous.

English version

Since the very beginning of Markov decision theory, it has been observed that dynamic programming operators $f$ arising in optimal control or (zero-sum, two player) game problems have Properties (6 ). Here, the operator $f$ is a self-map of a certain space of real valued functions, equipped with the standard ordering $\leq$ and with the sup-norm ${∥ \cdot ∥}_{\infty}$ . In the simplest case, the set of states is ${1, ..., n}$ , and $f$ is a self-map of $ℝ^{n}$ . Monotone maps that are nonexpansive in the sup norm may be thought of as nonlinear generalisations of substochastic matrices. A useful subclass, which generalises stochastic matrices, consists of those maps which are monotone and commute with the addition of a constant [91] (these maps are sometimes called topical functions). Dynamic programming problems can be translated in operator terms: the dynamic programming equation for a finite horizon problem can be written as $x (k) = f (x (k - 1))$ , where $x (k)$ is the value function in horizon $k$ and $x (0)$ is given; the value function $y$ of a problem with an infinite horizon (including the case of optimal stopping) satisfies $y = f (y)$ ; the value function $z$ of a problem with discount factor $0 < α < 1$ satisfies $z = f (α z)$ , etc. This abstract point of view has been very fruitful, see for instance [52] . It allows one to put dynamic programming in the wider perspective of nonlinear Perron-Frobenius theory, which, after the extension of the Perron-Frobenius theorem by Krein and Rutman, studies non-linear self-maps of cones, satisfying various monotonicity, nonexpansiveness, and homogeneity conditions. Typical problems of interests are the structure of the fixed point set of $f$ , the asymptotic behaviour of $f^{k}$ , including the existence of the limit of $f^{k} (x) / k$ as $k$ tends to infinity (which yields the ergodic cost in control or games problems), the finer asymptotic behaviour of $f^{k}$ , possibly up to a normalisation (which yields precise results on value iteration), etc. We shall not attempt to survey this theory here, and will only refer the reader to [150] for more background. In [110] ,[7] , algorithms inspired from the classical policy iterations algorithm of stochastic control have been introduced for general monotone nonexpansive operators, using structural results for the fixed point set of these operators. Applications of monotone or nonexpansive maps are not limited to optimal control and game theory. In particular, we also use the same class of maps as models of discrete event dynamics systems, see § 3.5 below, and we shall see in § 4.4 that related classes of maps are useful in the static analysis of computer programs.

Previous |

Home | Next next